ABSTRACT

An application of the two-parameter logistic (Rasch)
model to tailored testing is presented. The model is discussed along
with the maximum likelihood estimation of the ability parameters
given the response pattern and easiness parameter estimates for the
items. The technique has been programmed for use with an interactive
computer terminal. Use of the procedure is described in a flexible
achievement testing setting. Results are presented showing the number
of items needed for good estimation. The independence of ability
estimation from the particular items used is shown. Applications of
the system to intelligence testing are discussed. (Author)
I. Introduction

Tailored testing (Lord, 1970) can be defined as an evaluation
procedure that attempts to administer to an individual a test
composed only of items of an appropriate level of difficulty, and
only as many items as are needed for the purpose of the test. Most
currently used paper and pencil tests do not meet these
specifications. A fixed set of items is administered to every
individual regardless of whether the items are too hard or too easy,
and every person takes every item. Ideally, a tailored testing
procedure would select items for each individual from a large item
pool, possibly administering a different set of items to each
individual. A preset stopping rule would terminate a testing
session when enough information had been gained on an individual,
possibly administering a different number of items to each
individual.

Tailored testing procedures have made their appearance mainly
in response to problems with conventional testing situations.
These problems include inefficient use of examinee time, limited
test feedback, improper level of item difficulty, administrator
variables, answer sheet effects, time limits and many others.
Reviews of the literature by Linn, Rock, and Cleary (1969) and
Weiss and Betz (1973) give extensive coverage to these problems, so
they will not be discussed here. Since tailored testing procedures
generally are untimed, can give immediate feedback, give items of
appropriate difficulty level and solve many other testing problems,
they seem the obvious solution to difficulties in the traditional
test setting. Current research in the area has yielded many
different kinds of tailoring procedures under as many different
names. The purpose of this paper is to present yet another
tailoring procedure along with a computer program for its
implementation. The capabilities and limitations of the procedure
will be explored.

The model chosen as a basis for the tailored testing procedure
presented here is the Rasch simple logistic model (Rasch, 1960).
This model is thought to be a natural choice for tailored testing
because of its simplicity, the estimation of the ability parameters
independent of the item parameters, and the estimation of item
parameters independent of the sample. These properties allow items
to be calibrated on various different groups with comparable
results, and then allow a different set of calibrated items to be
used to estimate each individual's ability while still yielding
ability estimates on the same scale. The details of the procedure
will be presented in Section II of this paper.

The actual implementation of the tailored testing employs the
capabilities of a time-sharing computer system. Test items are
administered through computer terminals, and an interactive
computer program has been written to select items and estimate
ability parameters as the administration is taking place.
II. Theoretical Framework
The tailored testing procedure presented in this paper is
based on the Rasch simple logistic model (Rasch, 1960). This model
states that the probability that person $s$ will get item $i$ correct
is a function of two parameters: the ability of the person, $A_s$,
and the easiness of the item, $E_i$. More specifically,



$$P\{X_{si}\} = \frac{(A_s E_i)^{X_{si}}}{1 + A_s E_i}, \qquad X_{si} = 0, 1,$$



where $X_{si} = 1$ if the item is answered correctly and $X_{si} = 0$ if the
item is answered incorrectly. Both parameters, $A_s$ and $E_i$, range
from 0 to $\infty$. If $A_s = 0$, person $s$ has no ability and obviously the
probability of a correct response will be zero. As $A_s$ approaches
infinity, the probability of a correct response approaches 1.0. If
$E_i = 0$, the probability of a correct response is zero and therefore
item $i$ is extremely difficult. As $E_i$ approaches infinity, the
probability of a correct response approaches 1.0 and hence the item
is extremely easy.
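As a minimal sketch (not the program used in the study), the model can be written as a small Python function; the name `rasch_prob` is hypothetical. It takes $A_s$ and $E_i$ on the ratio scale used here and returns the probability of the observed score $x$.

```python
def rasch_prob(ability: float, easiness: float, x: int = 1) -> float:
    """P{X_si = x} under the simple logistic (Rasch) model.

    ability  -- A_s, the person parameter (0 <= A_s < infinity)
    easiness -- E_i, the item parameter (0 <= E_i < infinity)
    x        -- observed score: 1 = correct, 0 = incorrect
    """
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if ability < 0 or easiness < 0:
        raise ValueError("A_s and E_i must be non-negative")
    ae = ability * easiness
    # (A_s E_i)**x is A_s E_i for a correct answer and 1 for an incorrect one
    return ae ** x / (1.0 + ae)


print(rasch_prob(2.0, 1.5))     # correct answer: 0.75
print(rasch_prob(2.0, 1.5, 0))  # incorrect answer: 0.25
```

For example, a person with $A_s = 2$ facing an item with $E_i = 1.5$ has $A_s E_i = 3$, so the probability of a correct answer is 3/4 and of an incorrect answer 1/4.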

This model is a special case of the general three item
parameter logistic model developed by Birnbaum (1968). The three
parameter model is given by the following formula:



$$P\{X_{si} = 1\} = c_i + (1 - c_i)\,\frac{e^{a_i(\theta_s - b_i)}}{1 + e^{a_i(\theta_s - b_i)}}$$
where $c_i$ is a guessing parameter, $a_i$ is a discrimination parameter,
$b_i$ is a difficulty parameter, and $\theta_s$ is the ability parameter for
individual $s$. The simple logistic model can be obtained from the
above formula by setting $c_i = 0$, $e^{\theta_s} = A_s$, $e^{-b_i} = E_i$, and $a_i = 1.0$,
since the exponential term then becomes $e^{\theta_s - b_i} = A_s E_i$.
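The reduction can be checked numerically. The sketch below (function names are hypothetical) evaluates the three-parameter formula with $c_i = 0$ and $a_i = 1$ and compares it with the simple logistic form after the substitutions $A_s = e^{\theta_s}$ and $E_i = e^{-b_i}$; the two printed values should agree to floating-point precision.

```python
import math


def three_pl_prob(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum's three-parameter logistic model, P{X_si = 1}."""
    z = a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))


def simple_logistic_prob(ability: float, easiness: float) -> float:
    """Simple logistic (Rasch) model on the ratio scale, P{X_si = 1}."""
    ae = ability * easiness
    return ae / (1.0 + ae)


# Substitutions from the text: c_i = 0, a_i = 1, A_s = e^theta_s, E_i = e^(-b_i)
theta_s, b_i = 0.4, -0.7
A_s, E_i = math.exp(theta_s), math.exp(-b_i)
print(three_pl_prob(theta_s, a=1.0, b=b_i, c=0.0))
print(simple_logistic_prob(A_s, E_i))
```

For very large $|a_i(\theta_s - b_i)|$, `math.exp` can overflow; a production version would need to guard against that.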

From the relationship of the simple logistic model to the
three parameter logistic model, two of the assumptions of the simple
model can easily be determined. First, the simple logistic model
assumes that the probability of a correct response by guessing is
zero ($c_i = 0$), and second, all of the items are assumed to be of
equal discrimination ($a_i = 1$). Neither of these assumptions is
actually justified in practice. Multiple choice items are used for
the tailored testing procedure, so there is a guessing probability,
and unless items are selected very carefully there will be some
variation in item discrimination. However, Ross (1966) has found
that guessing has little effect on the Rasch model, and
Panchapakesan (1969) and Hambleton (1969) have shown that some
variation in the discrimination parameter will not affect the fit
of the model.

Two other assumptions also need to be made when using the
Rasch model. First, the model is based on the assumption that a
unidimensional latent trait is being measured, and second, the model
assumes local independence (i.e., for any given person, responses
to one item in no way affect responses to another item). These
last two assumptions are relatively easily met with careful test
construction procedures.
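Local independence is what makes likelihood-based estimation tractable: the probability of a whole response pattern is the product of the item probabilities. A minimal sketch, assuming calibrated easiness values are already available and $A_s > 0$ (the function name `log_likelihood` is hypothetical), sums the logarithms instead of multiplying, which avoids numerical underflow on long tests.

```python
import math
from typing import Sequence


def log_likelihood(ability: float, easiness: Sequence[float],
                   responses: Sequence[int]) -> float:
    """Log-likelihood of one person's 0/1 response pattern.

    Under local independence the joint probability of the pattern is the
    product of the item probabilities, so its logarithm is a sum.
    """
    if ability <= 0 or any(e <= 0 for e in easiness):
        raise ValueError("A_s and every E_i must be positive")
    if len(easiness) != len(responses):
        raise ValueError("need one easiness value per response")
    total = 0.0
    for e, x in zip(easiness, responses):
        ae = ability * e
        # log[(A_s E_i)^x / (1 + A_s E_i)] = x log(A_s E_i) - log(1 + A_s E_i)
        total += (math.log(ae) if x == 1 else 0.0) - math.log1p(ae)
    return total


# Hypothetical pattern: right on an easy item, wrong on a hard one
print(math.exp(log_likelihood(2.0, [1.5, 0.5], [1, 0])))
```

Here the pattern probability is $(3/4)(1/2) = 3/8$, the product of the two item probabilities.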

Once the theoretical model had been decided upon and the
assumptions had been evaluated to determine applicability of the
model, the major problem became the actual implementation. More
specifically, estimation procedures needed to be determined for the
ability and easiness parameters. Several techniques had already
been developed for the estimation of the parameters on a paper and
pencil test, and these techniques were readily applicable to the
calibration of items for use with the tailored testing program.
The currently available techniques can be classified into three
categories. First, there is the original least squares "eyeball"
approach that Rasch used in his original presentation of the model
(Rasch, 1960). Second, Brooks (1964) used least squares regression
techniques to quantify the graphic techniques used by Rasch, and
finally, Panchapakesan (1969) developed a maximum likelihood
technique that has been programmed for use on a computer (Wright
and Panchapakesan, 1969).

Based on information given in Panchapakesan's dissertation
(Panchapakesan, 1969), the maximum likelihood technique seems to
yield superior results and hence was used for item calibration in
this study. The actual computer program used for calibration was a
greatly modified version of a program obtained from Jerry Durovic
of the New York State Civil Service Department.
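The calibration program itself is not reproduced in the paper. As a rough illustration only, the sketch below implements a bare-bones joint (unconditional) maximum likelihood calibration by alternating one-dimensional Newton-Raphson steps for items and persons on the log scale ($\ln E_i$, $\ln A_s$). It is not the Wright and Panchapakesan (1969) program: it omits their bias correction and convergence checks, and the names `calibrate_items` and `_newton_step`, the ±1 step limit and the fixed iteration count are all assumptions.

```python
import math
from typing import List, Sequence


def _newton_step(score: float, offsets: Sequence[float], current: float) -> float:
    """One Newton-Raphson step for a single log-scale parameter, limited to +/-1."""
    probs = [1.0 / (1.0 + math.exp(-(current + o))) for o in offsets]
    gradient = score - sum(probs)                     # d lnL / d parameter
    information = sum(p * (1.0 - p) for p in probs)   # -d2 lnL / d parameter2
    return max(-1.0, min(1.0, gradient / information))


def calibrate_items(data: Sequence[Sequence[int]], n_iter: int = 200) -> List[float]:
    """Joint maximum likelihood easiness estimates E_i (persons x items 0/1 data).

    Assumes persons with zero or perfect scores, and items passed by everyone
    or no one, were removed first; their estimates would be infinite.
    The scale origin is fixed so that the geometric mean of the E_i is 1.
    """
    n_persons, n_items = len(data), len(data[0])
    person_scores = [sum(row) for row in data]
    item_scores = [sum(row[i] for row in data) for i in range(n_items)]
    log_a = [0.0] * n_persons   # ln A_s
    log_e = [0.0] * n_items     # ln E_i
    for _ in range(n_iter):
        # Update each item with the abilities held fixed.
        for i in range(n_items):
            log_e[i] += _newton_step(item_scores[i], log_a, log_e[i])
        # Re-centre: shifting ln E down and ln A up by the same amount
        # leaves every A_s E_i, and hence the likelihood, unchanged.
        shift = sum(log_e) / n_items
        log_e = [v - shift for v in log_e]
        log_a = [v + shift for v in log_a]
        # Update each person with the easiness values held fixed.
        for s in range(n_persons):
            log_a[s] += _newton_step(person_scores[s], log_e, log_a[s])
    return [math.exp(v) for v in log_e]


# Hypothetical data: 6 persons x 3 items
data = [[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]
print(calibrate_items(data))
```

Because the item score is a sufficient statistic in this model, the estimated easiness values follow the order of the item scores (5, 3 and 1 correct here).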

Since the ability parameters of the simple logistic model
needed to be estimated in real time after each item had been
administered to an individual, procedures developed for use with
standard group tests were no longer appropriate. As an alternative,
an algebraic maximum likelihood solution was attempted, but
solution of the necessary equations required finding the roots of
high order polynomials, and hence the algebraic procedure was
dropped. Instead, a computer program was written that searched the
likelihood function for its maximum. On trial runs, the program
was found to converge fairly rapidly on the maximum (approximately
seven iterations required), and hence it was reasonable to use the
procedure for real time estimation of ability. In practice,
operating system time lags are much more noticeable than the time
required to estimate ability parameters.
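Why the algebraic route leads to high-order polynomials can be seen from the likelihood equation. Setting the derivative of the log-likelihood to zero gives $\sum_i X_{si} = \sum_i A_s E_i / (1 + A_s E_i)$; multiplying through by $\prod_i (1 + A_s E_i)$ yields a polynomial in $A_s$ whose degree equals the number of items administered. The paper does not specify its search algorithm, so the sketch below swaps in Newton-Raphson on $\theta_s = \ln A_s$, where the log-likelihood is concave; the function name, the starting value and the one-logit step limit are assumptions. It also shows why a finite estimate needs at least one correct and one incorrect response: with all answers correct or all incorrect, the likelihood has no interior maximum.

```python
import math
from typing import Sequence, Tuple


def estimate_ability(easiness: Sequence[float], responses: Sequence[int],
                     tol: float = 1e-8, max_iter: int = 50) -> Tuple[float, int]:
    """Maximum likelihood estimate of A_s from calibrated easiness values.

    Newton-Raphson on theta = ln(A_s), where the log-likelihood is concave.
    Returns (A_hat, number of iterations used).
    """
    if len(easiness) != len(responses):
        raise ValueError("need one easiness value per response")
    n, r = len(responses), sum(responses)
    if r == 0 or r == n:
        # All-correct or all-incorrect patterns have no finite maximum.
        raise ValueError("need at least one correct and one incorrect response")
    log_e = [math.log(e) for e in easiness]
    # Starting value: log-odds of the raw score, adjusted for mean item easiness.
    theta = math.log(r / (n - r)) - sum(log_e) / n
    for iteration in range(1, max_iter + 1):
        probs = [1.0 / (1.0 + math.exp(-(theta + d))) for d in log_e]
        gradient = r - sum(probs)                        # d lnL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d2 lnL / d theta2
        step = max(-1.0, min(1.0, gradient / information))  # limit to 1 logit
        theta += step
        if abs(step) < tol:
            return math.exp(theta), iteration
    raise RuntimeError("no convergence within max_iter iterations")


A_hat, used = estimate_ability([0.5, 1.0, 2.0, 4.0], [0, 1, 1, 1])
print(A_hat, used)
```

The iteration count returned applies to this method only and need not match the roughly seven iterations reported for the original search.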

Along with ability parameter estimation, a procedure was also
needed to determine a lower bound on the ability estimate to be
used for classification purposes. Since the likelihood function
was already being used to estimate ability, the area beneath the
likelihood function was found using numerical integration, and the
lower 5% point of this distribution was set as the lower bound on
ability. This procedure is equivalent to a Bayesian procedure
assuming a rectangular prior with bounds from zero to one hundred.
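A minimal sketch of this lower bound, assuming the trapezoidal rule on an evenly spaced grid of $A_s$ values from 0 to 100 (the grid size and the name `lower_bound` are assumptions; the paper does not say which integration rule was used): the likelihood is normalized over $(0, 100]$, which gives the posterior under the rectangular prior, and the value below which 5% of the area lies is returned. Unlike the maximum likelihood estimate, this bound exists even for all-correct or all-incorrect patterns, because the prior's upper limit keeps the area finite.

```python
from typing import Sequence


def lower_bound(easiness: Sequence[float], responses: Sequence[int],
                upper: float = 100.0, n_grid: int = 20000,
                alpha: float = 0.05) -> float:
    """Lower alpha point of the likelihood of A_s normalized over (0, upper].

    Equivalent to the alpha quantile of the posterior for A_s under a
    rectangular prior on (0, upper]. Integration uses the trapezoidal rule.
    """
    h = upper / n_grid
    grid = [h * k for k in range(n_grid + 1)]

    def likelihood(a: float) -> float:
        value = 1.0
        for e, x in zip(easiness, responses):
            ae = a * e
            value *= (ae if x == 1 else 1.0) / (1.0 + ae)
        return value

    density = [likelihood(a) for a in grid]
    cumulative = [0.0]
    for k in range(1, n_grid + 1):
        cumulative.append(cumulative[-1] + 0.5 * h * (density[k - 1] + density[k]))
    if cumulative[-1] <= 0.0:
        raise ValueError("likelihood has zero area on the grid")
    target = alpha * cumulative[-1]
    for k in range(1, n_grid + 1):
        if cumulative[k] >= target:
            # Interpolate linearly inside the panel where the target is crossed.
            frac = (target - cumulative[k - 1]) / (cumulative[k] - cumulative[k - 1])
            return grid[k - 1] + frac * h
    return upper


print(lower_bound([1.0, 2.0, 0.5], [1, 1, 0]))  # hypothetical pattern
```

For a single incorrect response to an item with $E_i = 1$, the normalized area up to $t$ is $\ln(1+t)/\ln 101$, so the exact 5% point is $101^{0.05} - 1 \approx 0.26$, which the grid approximation reproduces closely.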

A final theoretical problem required a solution before the
tailored testing procedure could be implemented. That problem was
how the items to be administered to an individual were to be
chosen. Lord (1970) presented many possible schemes for item
selection, from a fixed stepsize up-and-down method to a variable
stepsize Robbins-Monro process. Weiss and Betz (1973) have also
presented an extensive summary of techniques for item selection.

The particular technique chosen for implementation involves
first estimating a person's ability parameter and then picking an
item for administration with easiness parameter greater than or
equal to the reciprocal of the ability parameter. Because
$E_i \ge 1/A_s$ implies $A_s E_i \ge 1$, and hence a probability of a
correct response of at least .50, this procedure results in the
selection of an item with a transitional difficulty index of .50 or
easier. If no ability parameter estimate is available, an item with
easiness parameter 1.00 is selected and a fixed step procedure is
used until both correct and incorrect responses have been obtained.
The procedure is discussed in greater detail in Reckase (1974).
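The selection rule can be sketched as follows. The paper does not say which eligible item is chosen, what fixed step size is used, or what happens when no item qualifies; the tie-break (least easy eligible item, i.e. the one closest to a .50 probability of success), the fallback to the easiest remaining item, `STEP_FACTOR = 2.0`, and the function names are all hypothetical choices.

```python
from typing import Optional, Sequence, Set

STEP_FACTOR = 2.0  # hypothetical fixed step on the easiness (ratio) scale


def fixed_step_target(previous: float, last_correct: bool) -> float:
    """Pre-estimate phase: start at E = 1.00, then step harder or easier."""
    return previous / STEP_FACTOR if last_correct else previous * STEP_FACTOR


def select_item(easiness: Sequence[float], used: Set[int],
                target: float) -> Optional[int]:
    """Pick an unused item with E_i >= target (target = 1/A_s once A_s exists).

    Since P = A E / (1 + A E), E_i >= 1/A_s gives P(correct) >= .50.
    Returns None when the item pool is depleted.
    """
    available = [i for i in range(len(easiness)) if i not in used]
    if not available:
        return None
    eligible = [i for i in available if easiness[i] >= target]
    if eligible:
        # Assumed tie-break: least easy eligible item, closest to P = .50.
        return min(eligible, key=lambda i: easiness[i])
    # Assumed fallback when nothing is easy enough: easiest remaining item.
    return max(available, key=lambda i: easiness[i])


pool = [0.25, 0.8, 1.0, 1.6, 4.0]      # hypothetical calibrated easiness values
first = select_item(pool, set(), 1.0)   # no estimate yet: start at E = 1.00
print(first, pool[first])               # 2 1.0
```

Once an ability estimate $A_s$ exists, the target becomes `1.0 / A_s`; for $A_s = 2.5$ the sketch would pick the item with $E_i = 0.8$.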



III. Method

As described in the previous sections, the purpose of this
paper is to present the results of a study into the practical
application of tailored testing. In this section the actual data
collection procedure will be described, including a description of
the subject sample used. The results presented here are based on a
pilot study for a more elaborate evaluation that is currently
being planned. While the sample is small, the data generated give
valuable information concerning the usefulness of the tailored
testing model.

The sample used for this study was composed of seventeen Ss
from a graduate-undergraduate measurement course at the University
of Missouri who volunteered to participate in the experiment. The
Ss ranged from college juniors to 2nd year graduate students and
were approximately evenly divided between males and females.

During the experiment each S was evaluated on an individual
basis in two ways. When an S arrived for the experimental session,
he was first administered a fifty item multiple-choice exam on
statistics and measurement concepts. The test was administered in
a small room, to minimize interruptions and distractions, without
any time limitations. After completing the paper and pencil test,
the S was taken to a second room containing an IBM 2741 typewriter
terminal and signed on to an IBM 370/165 computer for the tailored
testing procedure. Once accessed, the program typed out
instructions and asked the S for his student number and the
code for the subject matter area to be tested. The E (experimenter)
stayed with the S until all questions had been answered and it was
clear that the S understood the operation of the terminal. The E
then left the room and the administration became self-paced. The
testing session continued until a decision was reached or until
the item pool was depleted. The instructions to the S and a sample
item are shown in Figure 1.

The paper and pencil test used as a pretest was calibrated on
250 students in an undergraduate measurement course using the
maximum likelihood program developed by Wright and Panchapakesan
(1969) described in Section II of this paper. The subject matter
area, statistics and measurement concepts, was chosen because the
greatest number of items was available in the tailored testing
item pool in that area. Forty statistics and measurement items had
been stored in the tailored testing data set and were
available for this study. The items in the item pool were
calibrated on 253 to 966 students from an undergraduate measurement
course over a period of two years. Details of the item storage
format are given in Reckase (1974).

From the two testing situations described above, the following
data were gathered on each S: (1) the raw score on the
paper and pencil test, (2) the corresponding ability estimate, (3)
the letter grade classification for the subject on the test, (4)
the final ability estimate based on the tailored testing procedure,
(5) an estimate of the lower limit on the ability estimate, (6) the
letter grade assigned, and (7) the number of items administered by
the tailored testing procedure. These measures were then analyzed
to answer three questions. First, do the tailored testing
procedure and the paper and pencil test yield comparable ability
estimates? Second, do the tailored testing procedure and the paper
and pencil test classify the Ss in the same way as to letter grade?
And third, how many items were needed by the tailored testing
procedure to converge on an ability estimate?

In order to answer the experimental questions, the following
statistical procedures were used. To compare the ability estimates
obtained by the two techniques, both the Pearson product moment and
the Spearman rank correlation were computed, since the scale
properties of the ability scales are unclear. The similarity of
classification was determined using Kendall's $\tau$ statistic (Siegel,
1956), and the number of items needed for convergence is shown by
the distribution of the values and summary statistics including the
mean and standard deviation. These results, with the raw data, are
presented in the next section.
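For readers who want to reproduce these statistics, the sketch below computes the Pearson and Spearman coefficients and Kendall's $\tau$ from scratch. For $\tau$ it uses the tie-corrected form (tau-b), which is one standard treatment of the heavy ties that letter grades produce; the paper does not state which variant was computed. Coding the grades numerically (A = 4 through D = 1) is an assumption, the function names are hypothetical, and the grades in the usage lines are made up for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau with the usual correction for ties (tau-b)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied on both variables: counted in neither
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + tied_y_only   # pairs not tied on x
    untied_y = concordant + discordant + tied_x_only   # pairs not tied on y
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Made-up example: letter grades coded A = 4, B = 3, C = 2, D = 1
GRADE = {"A": 4, "B": 3, "C": 2, "D": 1}
paper = [GRADE[g] for g in "AABCCD"]
tailored = [GRADE[g] for g in "ABBCDD"]
print(kendall_tau_b(paper, tailored))
```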

IV. Results

The data for the analysis comparing the ability estimates
obtained using each of the methods are shown in Table 1 along with
the other measures obtained in the study. The Pearson product
moment correlation between the two sets of ability estimates is
0.61 and the Spearman rank order correlation is 0.73. If the ability
estimates are on a ratio or interval scale, the former value is
more appropriate; if the scale assumptions are not met, the rank
order correlation is more appropriate. In order to interpret these
correlations, the reliabilities of each of the procedures are
required. The KR-20 reliability of the paper and pencil test is
available and is 0.72, but no data are available on the reliability
of the tailored testing procedure. Both of the correlation
coefficients are significant beyond the 0.05 level. Summary
statistics on the ability estimates, including the mean, median and
standard deviation, are given in Table 2. There are no significant
differences between any of these statistics.
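The paper does not spell out why the reliabilities matter here. One standard classical test theory result is that an observed correlation cannot exceed $\sqrt{r_{xx} r_{yy}}$, the square root of the product of the two reliabilities. A minimal sketch follows, together with a KR-20 function for 0/1 item data (the function names are hypothetical). With $r_{xx} = 0.72$ and even a perfectly reliable tailored test ($r_{yy} = 1$), the ceiling is about 0.85, which puts the observed 0.61 and 0.73 in context.

```python
import math
from typing import Sequence


def kr20(scores: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if k < 2 or var_total == 0:
        raise ValueError("need at least two items and some variance in total scores")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n_persons   # proportion correct on item j
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


def correlation_ceiling(rel_x: float, rel_y: float = 1.0) -> float:
    """Largest observed correlation allowed by reliabilities rel_x and rel_y."""
    return math.sqrt(rel_x * rel_y)


print(round(correlation_ceiling(0.72), 3))  # 0.849
```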

The similarity of the grade classifications of the two methods
is summarized in the two-way table given in Table 3. A $\tau$
statistic showing similarity has been computed on these data,
yielding a value of 0.57. This statistic is significantly
different from zero beyond the 0.001 level.

The results concerning the number of items required to
classify an S into a grade category are shown in Table 4. Given are
a frequency distribution for the number of items needed and the
following descriptive statistics: the mean, median, mode, standard
deviation, and range. From these data it can be seen that the
distribution is highly positively skewed, with a median value of
twelve. This should be compared to the fifty items used on the
paper and pencil test.

V. Discussion

Interpreting the results of this study is somewhat problematic
because it is difficult to decide what results are desirable.
Should the tailored testing procedure ideally yield ability
estimates and grade classifications identical to those obtained by
the less than perfect paper and pencil test, or should the ability
estimates be different, reflecting a perhaps "better" tailored
testing procedure? It seems that at the very least there should be
some similarity between the methods, since they are trying to do the
same thing; but if the tailored testing procedure is more accurate,
the similarity should not be too great in light of the 0.72 reliability
of the paper and pencil test.

The correlational data obtained in this study show that the
two methods yield similar results, but that the ability estimates
are far from equal and the grade classifications are the same for
only two-thirds of the Ss. This leaves open the possibility that
the tailored testing procedure is an improvement over the
traditional test, but several other possibilities could also
explain the moderate correlations.

First, the item administration program terminates the testing
session once a grade of A has been obtained. This occurred after
as few as six items had been administered, which is hardly an
adequate number for good estimation. If the administration of
items had been allowed to continue after the assignment of an A
grade, more accurate estimates would probably have been obtained,
and the agreement of the estimates with those of the paper and
pencil test might have been better.

A second source of error in the estimation of ability is the
number and quality of items in the item pool. The item pool
used for this study contained only forty items, some of which were
of poor quality. Recent simulation studies seem to indicate that
about 250 items are required for good estimation. If the item pool
is small, the simulations show that the procedures used in this
study will tend to overestimate ability. In light of the simulated
results, the forty item pool seems to have done amazingly well.

Another difficulty related specifically to the grading
procedure is the technique for estimating the lower limit of
ability. The lower 5% point on the likelihood distribution is at
best a rough indication of the lower limit. Bayesian procedures based
on various families of prior distributions are currently
being studied as alternatives.

Although the procedure is beset with the problems just
described, the end result has for the most part been positive. The
procedure has been shown to work, and most of the problems discussed
can be overcome by reprogramming and by increasing the size of the
item pool.

A more positive outcome of this study is the determination of
the number of items required to classify Ss into grade categories.
As is shown in Table 4, the distribution is highly positively
skewed, with a median value of twelve items. This is a substantial
reduction from the fifty items in the standard test, although
administration of the test is slower since each item is typed out
during the testing session. The time needed for the administration
of thirty tailored items is about equal to the time needed for the
fifty item paper and pencil test. The use of faster cathode ray
tube terminals will greatly reduce test administration time.

In summary, the tailored testing procedure described in this
paper has been shown to yield similar, but not equivalent, results
to those of a conventional test in both the estimation of ability
parameters and the assignment of letter grades. These results are
obtained using substantially fewer items than the conventional test
while administering different items to each individual. Problems
encountered in the operation of the procedure were also discussed,
including the size of the item pool, limits on the ability
estimates, and problems with the stopping rule. Overall, the
procedure has been shown to be a viable tailored testing method,
worthy of further research and refinement.
Figure 1

TERMINAL TESTING PROCEDURE

YOU WILL BE PRESENTED WITH A SERIES OF TEST ITEMS. RESPOND TO
EACH ITEM BY TYPING THE APPROPRIATE LETTER AND PRESSING THE
RETURN KEY. ITEMS WILL BE PRESENTED UNTIL A CLEAR DECISION
IS REACHED CONCERNING WHETHER YOU ARE ABOVE OR BELOW A C GRADE.
IF YOU WISH TO CONTINUE ON FOR A HIGHER GRADE, INSTRUCTIONS
WILL BE GIVEN AT THAT POINT. IF AT ANY TIME YOU WISH TO STOP
BEFORE A DECISION HAS BEEN MADE, TYPE THE WORD STOP AFTER
YOUR LETTER RESPONSE AND PRESS THE RETURN KEY.

PLEASE TYPE YOUR STUDENT NUMBER AND THE RETURN KEY

IF YOUR STUDENT NUMBER CONTAINS ONLY 5 DIGITS
START IT WITH A LEADING ZERO TO MAKE SIX DIGITS

100000

INPUT: ID = 100000

TYPE THE CODE CORRESPONDING TO THE AREA YOU ARE TO BE TESTED ON

SM FOR STATISTICS AND MEASUREMENT

ET FOR CLASSROOM EVALUATION TECHNIQUES

ST FOR STANDARDIZED TESTS
AFTER TYPING THE PROPER CODE, PRESS THE RETURN KEY
sm

INPUT: TEST CODE = SM

1

A PSYCHOLOGIST WHO WANTS A MEASURE OF THE EXTENT TO WHICH
SCORES IN A GROUP VARY MIGHT CONCEIVABLY CHOOSE ANY ONE OF THE
FOLLOWING EXCEPT

(A) THE RANGE.

(B) THE VARIANCE.

(C) THE STANDARD DEVIATION.

(D) THE MEDIAN.

TYPE RESPONSE LETTER AND PRESS RETURN

d

CORRECT
Table 2

Summary Statistics on Ability Estimates

                          Paper and Pencil Test    Tailored Test

Mean                      12.62                    11.99
Median                    6.330                    11.516
Standard Deviation        13.00                    10.55
Reliability (KR-20)       0.72                     --

Correlation
  Pearson Product Moment  0.61*
  Spearman ρ              0.73**

* p < 0.05
** p < 0.01






Table 3

Similarity of Grade Classification

Rows: Paper and Pencil Classification (A, B, C, D)
Columns: Tailored Test Classification (A, B, C, D)

[Cell frequencies of the 4 x 4 cross-tabulation are not reliably
recoverable from the source scan.]

Kendall's τ = 0.57***

*** p < 0.001



Table 4

Summary of the Number of Items Required
for Tailored Testing Procedure

Frequency Distribution

Number of Items Required    Frequency
1-5                         0
6-10                        8
11-15                       3
16-20                       1
21-25                       1
26-30                       2
31-35                       1
36-40                       1

N = 17
Mean = 15.59
Median = 12.00
Mode = 6.00
Standard Deviation = 10.14
Range = 6-39